Efficient Learning in Large-Scale Combinatorial Semi-Bandits
Abstract
$$\cdots = \tilde{O}\left(K\sqrt{dn\min\{\ln(L), d\}}\right). \tag{11}$$

We now outline the proof of Theorem 3, which is based on (Russo & Van Roy, 2013; Dani et al., 2008). Let $H_t$ denote the "history" (i.e. all the information available) at the start of episode $t$. From the Bayesian perspective, conditioned on $H_t$, $\theta^*$ and $\theta_t$ are i.i.d. draws from $N(\bar{\theta}_t, \Sigma_t)$ (see (Russo & Van Roy, 2013)). This is because, conditioned on $H_t$, the posterior belief in $\theta^*$ is $N(\bar{\theta}_t, \Sigma_t)$, and by Algorithm 2, $\theta_t$ is sampled independently from $N(\bar{\theta}_t, \Sigma_t)$. Since ORACLE is a fixed combinatorial optimization algorithm (even though it may be independently randomized), and $E$, $\mathcal{A}$ and $\Phi$ are all fixed, conditioned on $H_t$ the sets $A^*$ and $A_t$ are also i.i.d.; furthermore, $A^*$ is conditionally independent of $\theta_t$, and $A_t$ is conditionally independent of $\theta^*$.

To simplify the exposition, for all $\theta \in \mathbb{R}^d$ and all $A \subseteq E$ we define

$$g(A, \theta) = \sum_{e \in A} \langle \phi_e, \theta \rangle. \tag{12}$$

Then

$$\mathbb{E}[f(A^*, w_t) \mid H_t, \theta^*, \theta_t, A^*, A_t] = g(A^*, \theta^*), \qquad \mathbb{E}[f(A_t, w_t) \mid H_t, \theta^*, \theta_t, A^*, A_t] = g(A_t, \theta^*),$$

and hence

$$\mathbb{E}[R_t \mid H_t] = \mathbb{E}[g(A^*, \theta^*) - g(A_t, \theta^*) \mid H_t].$$

We also define the upper confidence bound (UCB) function $U_t : 2^E \to \mathbb{R}$ as $U_t(A) = \sum_{e \in A} \langle \ldots$
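To make the objects in this proof outline concrete, here is a minimal Python/NumPy sketch of a generic Thompson-sampling loop of the kind the argument analyzes: sample $\theta_t$ from $N(\bar{\theta}_t, \Sigma_t)$, pass the sampled item weights $\Phi\theta_t$ to an oracle, update the Gaussian posterior from semi-bandit feedback, and record the term $g(A^*, \theta^*) - g(A_t, \theta^*)$ from the regret decomposition. It is not the authors' Algorithm 2. Assumptions made only for illustration: the feasible set is all subsets of exactly $K$ items, so ORACLE is a hypothetical exact top-$K$ routine; the prior is $N(0, I)$; each observed weight is $\langle \phi_e, \theta^* \rangle$ plus Gaussian noise of known variance `sigma2`; the functions `g`, `top_k_oracle`, `posterior_update`, `run_sketch` and all parameter values are hypothetical.

```python
import numpy as np


def g(A, theta, Phi):
    """g(A, theta) = sum over e in A of <phi_e, theta>, with phi_e = row e of Phi."""
    return float(sum(Phi[e] @ theta for e in A))


def top_k_oracle(weights, K):
    """Hypothetical ORACLE (assumption): exact maximizer of sum_{e in A} weights[e]
    over all subsets A of exactly K items."""
    return sorted(np.argsort(weights)[::-1][:K].tolist())


def posterior_update(theta_bar, Sigma, phi_e, w_e, sigma2):
    """Rank-one Gaussian (Kalman / Bayesian linear regression) update of
    N(theta_bar, Sigma) after observing w_e = <phi_e, theta*> + N(0, sigma2) noise."""
    Sigma_phi = Sigma @ phi_e
    denom = sigma2 + phi_e @ Sigma_phi
    theta_bar = theta_bar + Sigma_phi * (w_e - phi_e @ theta_bar) / denom
    Sigma = Sigma - np.outer(Sigma_phi, Sigma_phi) / denom
    return theta_bar, Sigma


def run_sketch(L=20, d=3, K=4, n=200, sigma2=0.25, seed=0):
    """Simulate n episodes; return the per-episode terms g(A*, theta*) - g(A_t, theta*)."""
    rng = np.random.default_rng(seed)
    Phi = rng.normal(size=(L, d))               # generalization matrix (random here)
    theta_bar, Sigma = np.zeros(d), np.eye(d)   # assumed prior N(0, I)
    theta_star = rng.multivariate_normal(theta_bar, Sigma)  # theta* drawn from the prior
    A_star = top_k_oracle(Phi @ theta_star, K)  # optimal set under the linear model
    regrets = []
    for _ in range(n):
        # Thompson step: sample theta_t from the current posterior N(theta_bar, Sigma).
        theta_t = rng.multivariate_normal(theta_bar, Sigma)
        A_t = top_k_oracle(Phi @ theta_t, K)
        regrets.append(g(A_star, theta_star, Phi) - g(A_t, theta_star, Phi))
        # Semi-bandit feedback: one noisy weight per chosen item, each used for an update.
        for e in A_t:
            w_e = Phi[e] @ theta_star + rng.normal(scale=np.sqrt(sigma2))
            theta_bar, Sigma = posterior_update(theta_bar, Sigma, Phi[e], w_e, sigma2)
    return np.array(regrets)
```

Because `top_k_oracle` solves the assumed problem exactly and $A^*$ is computed from $\theta^*$, every entry returned by `run_sketch()` is non-negative. The update in `posterior_update` is the Sherman–Morrison form of the batch Gaussian posterior, so processing the $K$ observed items one at a time yields the same posterior as a single batch update.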
Similar resources
Efficient Learning in Large-Scale Combinatorial Semi-Bandits
• The agent knows a generalization matrix $\Phi \in \mathbb{R}^{L \times d}$ such that $\bar{w} = \mathbb{E}_P[w_t]$ is "close" to $\mathrm{span}[\Phi]$. • Such models are available in many cases. Performance Metrics: At each time $t$, choosing $A_t \in \mathcal{A}$ can be challenging, since the combinatorial optimization problem $\max_{A \in \mathcal{A}} \sum_{e \in A} w(e)$ can be NP-hard. We assume the agent uses a combinatorial optimization algorithm ORACLE to choose $A_t$, where ORACLE can be an approx...
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we an...
Matroid Bandits: Practical Large-Scale Combinatorial Bandits
A matroid is a notion of independence that is closely related to computational efficiency in combinatorial optimization. In this work, we bring together the ideas of matroids and multiarmed bandits, and propose a new class of stochastic combinatorial bandits, matroid bandits. A key characteristic of this class is that matroid bandits can be solved both computationally and sample efficiently. We...
Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits
We propose a sample-efficient alternative for importance weighting for situations where one only has sample access to the probability distribution that generates the observations. Our new method, called Recurrence Weighting (RW), is described and analyzed in the context of online combinatorial optimization under semi-bandit feedback, where a learner sequentially selects its actions from a combi...
Semi-Bandits with Knapsacks
We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks and combinatorial semi-bandits. The former concerns limited “resources” consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a huge number of actions but assumes combinatorial structure and additional feedback to make the problem tractable. We define a common generalization, supp...
Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
We study a stochastic online problem of learning to influence in a social network with semi-bandit feedback, individual observations of how influenced users influence others. Our problem combines challenges of partial monitoring, because the learning agent only observes the influenced portion of the network, and combinatorial bandits, because the cardinality of the feasible set is exponential i...